Tech Arsenal 1

home *** CD-ROM | disk | FTP | other *** search

/ Tech Arsenal 1 / Tech Arsenal (Arsenal Computer).ISO / tek-04 / ndx300.zip / NDX300.TXT < prev next >

Wrap

Text File | 1992-11-02 | 8KB | 211 lines

NDX -- BTREE INDEXING ROUTINES Version 3.00 Copyright 1987,88,89,92 by George H. Mealy, all rights reserved INTRODUCTION This package implements yet another variant of the balanced tree indexing method (Btree) described in section 6.2.4 of D. E. Knuth's "Sorting and Searching", volume 3 of "The Art of Programming." Its features are: 1. Files with variable length records can be indexed. Data file format is arbitrary, since an index is stored as a separate file. 2. Keys are variable length ASCIZ strings. The package allows retrieval on heads of index keys: when retrieving on the search key "FOO", records with keys "FOO", "FOO1", "FOOP" and "FOOPU" may be retrieved in that order. That is, a search key may be matched by any key of which it is a head. Matching can be case- sensitive or case-insensitive. Search behaviour is specified when the index is opened. 3. ISAM access to index entries is avail able; it may be first-in-first-out or last-in-last-out, as specified when the index file is created. 4. Index file records are cached. A single cache is used when more than one index file is open. 5. The source file is supplied. The package can be compiled using Turbo C version 1.5+ or Microsoft C version 5.0+. This package is copyrighted material. You may use it for noncommercial purposes and make exact and complete copies for others at cost. Modified versions of the package may NOT be copied without express permission. In any case, no warranty of any kind is made or implied by the author. The author may be reached at (617) 545-1727. INDEX FILE FORMAT The index file begins with a short header, padded with enough bytes to fill a 512 byte disk sector. (See NDX.H for the format of the header and other data structures.) Nodes in the tree are 1024 bytes in length on the disk. Each starts with a value -- that of the node if it is currently in use, or that of the next node in the chain of free nodes on the disk. The number of bytes occupied by keys in the node follows, and then the keys. Each KEY has three fields. First is the value, if not zero, of a lower level node which has keys less than or equal to the current key, followed by a reference to the place in the data file corresponding to the current key. Finally, the ASCIZ key itself is stored. Please note that only as many bytes as are necessary are stored. Immediately following the last key in the node, a pointer to the next lower level may be stored; the field end_keys of NODE does not include the length of this pointer. The index structure INDEX is allocated and initialized when the index file is opened. It contains: 1. A copy of the file header 2. A KEY structure which holds the key resulting from index file operations. 3. A stack holding the path from the root of the index to the current key entry. 4. The index file name and handle. 5. The string comparison function to be used during key retrieval. USAGE The constructor Index::Index(char *indexname, unsigned mode, CFN compfn) creates an empty index file. The comparison function, if NULL, defaults to strcmp. The mode is one of: NODUPS No duplicate keys allowed FIFO Duplicate keys are retrieved first-in-first-out. LIFO Duplicate keys are retrieved last-in-last out. The mode is stored in the index file header. A NULL second argument defaults to NODUPS. The third argument is the routine to be used for key matching -- if NULL, strcmp is used. Index::Index(char *indexname, int (*compfn)()) is used to open an existing index file, while the destructor closes an index file. DWORD Index::insert(char *key, DWORD value); adds a key to the index, except when the mode is NODUPS and the key is already present. In this case, the value specified in the call replaces the previous value. If successful, insert returns the specified DWORD and otherwise it returns zero. The value can be a long unsigned seek address used to reference the data file, or just a serial record number when the data file has fixed length records. A valid value may not be zero. The comparison function used for insertion is always strcmp. Index::remove(char *key, DWORD value); deletes the key from the index. If this leaves an empty node, the node is placed in the freelist. The second argument is required due to the possibility of multiple occurrences of the same key. Returns 0 for failure and 1 for success. DWORD Index::find(char *key); attempts to find the specified key, according to the comparison function specified when the file was opened and the mode speci fied when the file was created. The value part of the key found is returned on success; zero is returned on failure. In order to find all occurences of a key when FIFO or LIFO operation is in effect, use Index::find with a NULL second argument after the first occurence has been found. DWORD Index::first(unsigned last); positions the index file pointer to the first (last) key in the index. The "file pointer" is, strictly speaking, the top stack entry and contains the seek address of the node and the offset of the key in that node. The remainder of the stack records information required for traveling over the index tree starting from the file pointer, as if the index were flat rather than a tree. Zero is returned if the index is empty, else the value part of the key located is returned. DWORD Index::next(); DWORD Index::prev(); These routines change the file pointer to the next or previous key and return the value part of that key, or zero if the current file pointer was at the end or beginning of the index. Note that Index::find, Index::insert and Index::remove position the file pointer. In the latter case, the file pointer is set to the key following the deleted key. Note further that each successful key retrieval stores a copy of the current key structure in the index structure. For examples of the use of the routines, see the test program. IMPROVEMENTS Knuth suggests ways in which the Btree algorithms may be improved. If you decide to experiment, be prepared for an interesting [sic] experience. Be particularly wary of any changes you attempt to make in xdelete -- updating the index to properly reflect a deletion is an extremely tricky affair. Even xnext and xprev, which look deceptively simple, are easy to upset if you try to change the order in which things are done. CHANGES My address is: George H. Mealy 38 Gilson Road Scituate, MA 02066 USA (617) 545-1727 Version 2.02 corrected an egregious blunder in the Index::insert routine, thanks to John A. Matzen. I also fixed a bug in Index::find, thanks to strong type checking and use of the typedef CFN. The package has been recompiled using Borland C++ v3.0. The file NDX.DOC is a Word for Windows document.